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Abstract. We consider the numerical solution of the scattering of time-harmonic plane 
waves from an infinite periodic array of reflection or transmission obstacles in a homoge- 
neous background medium, in two dimensions. Boundary integral formulations are ideal 
since they reduce the problem to N unknowns on the obstacle boundary. However, for 
) complex geometries and/or higher frequencies the resulting dense linear system becomes 

large, ruling out dense direct methods, and often ill-conditioned (despite being 2nd-kind), 
rendering fast multipole-based iterative schemes also inefficient. We present an integral 
equation based solver with O(N) complexity, which handles such ill-conditioning, using 
recent advances in "fast" direct linear algebra to invert hierarchically the isolated obstacle 
matrix. This is combined with a recent periodizing scheme that is robust for all incident 
angles, including Wood's anomalies, based upon the free space Green's function kernel. 
■ The resulting solver is extremely efficient when multiple incident angles are needed, as 

£L|) occurs in many applications. Our numerical tests include a complicated obstacle several 

wavelengths in size, with N = 10 5 and solution error of 1CP 10 , where the solver is 66 times 
faster per incident angle than a fast multipole based iterative solution, and 600 times 
faster when incident angles are chosen to share Bloch phases. 
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>• ) 1. Introduction 

l> 

t^J- ■ Numerical modeling of the scattering of linear time-harmonic waves from materials with 

CN , periodic geometry plays a key role in modern optics, acoustics, signal processing, and 

antenna design. Periodic diffraction problems where numerical modeling is crucial include 
the design of gratings for high-power lasers [6] , thin-film solar cells [1] and absorbers [39J , 
process control in semiconductor lithography [3l] , linear water-wave scattering from pillars 
[37j . and radar sensing of ocean waves. In all such problems, an incident plane wave 
generates a scattered wave which (in the far field) takes the form of a finite sum of plane 
waves at known Bragg angles, whose amplitudes are desired. 

Several challenges arise in the efficient numerical solution of grating scattering problems: 
(i) the period of the gratings can be many wavelengths in size, (ii) in many applications 
(such as photovoltaic [I] or solar absorber design [39 j ) solutions arc needed at many incident 
angles and/or frequencies, (iii) the scatterer may have physical resonances in the form of 
guided modes (see Remark 12. ip . which leads to ill-conditioned problems, and (iv) so-called 
Wood's anomalies may occur, that is, scattering parameters (incident angle and frequency) 
for which one of the Bragg angles lies precisely along the grating. Note that challenges (iii) 
and (iv) are distinct: (iii) is a physical resonance leading to an ill-posed problem (see the 
reviews [HI [3D]), whereas Wood's anomalies do not cause ill-posedness — and yet they do 
cause problems for many numerical schemes. 

In most problems of interest, gratings consist of homogeneous media delineated by sharp 
interfaces; hence the corresponding partial differential equations (PDEs) have piecewise- 
constant coefficients. This manuscript focuses on the following two-dimensional Dirichlet 
boundary value problem, which models acoustics with sound-soft obstacles, or electromag- 
netic scattering in TM (transverse magnetic) polarization from perfect electric conductors 
(where u represents the out-of-plane electric field [5]). We seek the scattered wave u which 

l 



2 



A. GILLMAN, A. BARNETT 



solves, 

(A + u 2 )u(x) = x:= (x,y) eR 2 \Th 

(1.1) n(x) = -u\x) x e 

u radiative as y — > ±oo , 

where f2 C M 2 is a single bounded obstacle which is repeated to form an infinite grating of 
obstacles (denoted by 0g) of period d along the x-axis (see Fig. 11.1( a)). A plane wave with 
frequency uj, angle 9 G (— ir, 0), and wavevector k = (k 1 , fc 1 ) = (ojcos#,o;sin#) is incident; 
hence u : (x) = e lkx outside the obstacles and vanishes inside. It is easy to see from (II. ip 
that the total field u l = u 1 + u vanishes on dQ: this is the physical boundary condition. 
For the exact radiation condition on the scattered field u see (|2.5p - (|2.6p . The incident field, 
and hence the scattered field, satisfy a quasi-periodicity condition, 

(1.2) u(x + d,y) = au(x,y) V(i,i/) £R 2 , 

where a := e ll ^ d = e tu>dcose is known as the Bloch phase. We will also consider the 
transmission problem corresponding to dielectric obstacles (see section [5]) . 

Integral equation formulations are a natural choice for problems with piecewise-constant 
coefficients such as (II. lj) : they exploit this fact by reducing the problem to an integral 
equation whose discretization involves N unknowns living on the boundary alone. This has 
many advantages over finite elements or finite difference schemes: it reduces the dimension- 
ality by one, allowing for more complicated boundaries to be solved to high accuracy with 
a small N, and is amenable to easily implemented high-order quadratures. 

In the grating application, the infinite boundary must be reduced to the boundary 
of a single obstacle <9f2; there are two main approaches to this task of "periodizing" the 
integral equations. 

The first approach replaces the Green's function (fundamental solution) that appears in 
the kernel of in the integral operators by its quasi-periodized version [30] (which satisfies 
(jl.2p ). This is used in two dimensions by Bruno-Haslam [10] and in three by Nicholas |36j 
and Arens [2]. However, this fails as parameters approach a Wood's anomaly because the 
Green's function does not exist there. One cure (in the case of a connected interface) is 
to use the quasi-periodic impedance half-space Green's function \S2\ [3]. However, all such 
periodized kernel methods do not scale well to large TV" because dense matrices must be 
filled at a cost of order a millisecond per element |26j . 

The second approach, which is robust at all parameters including at or near Wood's 
anomalies, is to return to the free-space Green's function and instead periodize using a 
small number of additional unknowns on artificial unit-cell walls. The condition (|1 .2j) is 
then imposed directly in the linear system. This is used by Wu-Lu [33], and also by one 
of the authors in [5]. Further advantages of the second approach include: the low-rank 
nature of the periodizing is explicit (no dense matrices of periodized Green's evaluations 
are needed), and that free-space boundary integral equation quadratures and codes may be 
used without modification. 

In this paper, we present a fast direct solver for the formulation in [5], that can handle, in 
reasonable computation times, complicated boundaries that both demand large N and cause 
ill-conditioning. Section [2] reviews the derivation of the periodized integral formulation in 
the Dirichlet setting (II. ip . 

Like many boundary integral equations for two-dimensional domains, this gives a lin- 
ear system that could be solved via an iterative solver (e.g. GMRES) coupled with a fast 
matrix- vector multiplication scheme such as the fast multipole method (FMM) [12] . Unfor- 
tunately, when the system is ill-conditioned (which often occurs for complicated geometry) 
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it can take hundreds of iterations for such an iterative solver to converge, or worse, it may 
never converge to an acceptable accuracy. Additionally, iterative methods are not able to 
efficiently solve problems with multiple right hand sides that occur when the response is 
needed at multiple incident angles. This has motivated in recent years the development 
of a collection of fast direct solvers. These solvers utilize internal structure (essentially 
low-rank off-diagonal blocks) to construct rapidly an approximate inverse of the dense ma- 
trices resulting from the the discretization of integral equations. For many problems, the 
computational cost is (9(iVlog fe N) with typically fc = 0, 1, or2|221l21EIlEll331l32j, which 
can be orders of magnitude smaller than traditional 0(N 3 ) dense direct methods. 

In addition to being fast, the solvers are robust and construct the approximate inverse 
in such a way that it can be applied in with O(N) (with small constant) computational 
cost. In this work, we choose a Hierarchically Block Separable (HBS) solver |17] , Like 
the method in [25], the solver, described in section HI exploits potential theory to create 
low-rank factorizations. As a result, the computational cost scales linearly with the number 
of discretization points for low-frequency problems on many domains. 

Upon discretization, the integral formulation of the grating scattering problem in section 
[2] gives a two- by- two block linear system. Section [3] presents an efficient technique for 
solving this system with multiple right-hand sides, assuming an inverse of the large N x N 
block is available. The full fast direct scheme for the problem is then achieved by using 
the HBS solver of section H] to compute and apply this large matrix inverse whenever it is 
needed in the technique of section [3l 

We describe how our technique may be adapted to the transmission scattering problem 
in section [5j Section [6] illustrates the performance of the fast direct solver in some test 
cases, and compares the computation time to a standard fast iterative scheme. Finally, we 
draw some conclusions in section [7J 



2. Integral formulation for the quasi-periodic Dirichlet problem 

In this section, we describe an integral equation formulation for the grating scattering 
problem (II. ID that is based upon [5]. First we rephrase this boundary- value problem as an 
equivalent one on the unit cell containing a single obstacle. 

2.1. Restriction to a problem on a single unit cell. We assume that the copies of Q 
do not intersect and that we can create a unit cell "strip" U := {(x,y) : \x\ < d/2,y € R} 
containing the closure of £1. Let the infinite bounding left and right walls be denoted by 
L and R respectively (see Figure [TTTT b)). As in [8], the boundary value problem can be 
rephrased as an equivalent problem on the domain U, that is 



(2.1) 
(2.2) 




2 








u 



-u' 



on d£l . 



Quasi-periodicity of the field is imposed via wall matching conditions 



(2.3) 
(2.4) 



u\l — a u\r = 
Uu\l - cr l u n \n = 



where the normal derivatives on L, R are taken to be in the positive x direction. The 
total field u satisfies the outgoing radiation condition [8] , which is to say that it is given by 
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Figure 1.1. (a) Geometry of the periodic scattering problem, (b) Geom- 
etry of the periodic scattering problem with artificial unit-cell walls L and 
it! introduced, (c) Quadrature scheme on the Sommerfeld contour in the 
complex Fourier plane (wiggly lines show branch cuts of the integrand). 

uniformly-convergent Ray leigh-B loch expansions, 

(2.5) u{x,y) = J2c n e iKnX e ikn{v ~ m) V > Vo, \x\ < d/2 

(2.6) u(x,y) = J2 d n eiKnXeikn( ~ y ~ yo) V < -Vo, N < d/2 

where yo > su P( x ,y)eQ \v\i so that ft lies within the vertical bounds \y\ < yo, the ^-component 
of the nth mode wavevector is K n := k 1 + 2nn/d, and the y-component is k n := +\/lo 2 — k^- 
The sign of the square-root is taken as non-negative real or positive imaginary. The coeffi- 
cients c n and d n for orders n that are propagating (|K n | <ui) are the desired amplitudes of 
the Bragg orders mentioned in the Introduction. 

The above radiation condition also completes the precise description of the scattering 
problem (jl.ip . We now make a remark about well-posedness of (jl.ip . which of course 
equally well applies to the single unit cell version presented above. 

Remark 2.1. With the radiation condition (I2.5p - (l2.6p . at least one solution exists to (jl.ip 
at all scattering parameters (angles 9 and frequencies u) JSl Thm. 3.2]. Furthermore, at each 
angle, the solution is unique except possibly at a discrete set of frequencies which correspond 
to physical guided modes. The only such modes accessible in the scattering setting are 
embedded in the continuous spectrum [H]. [HdT] give conditions for nonexistence of such 
modes are given. These results will also hold for the transmission version where (|2.ip - (|2.2p 
are replaced by (|5.ip - (|5.4p below. 
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2.2. An indirect boundary integral equation formulation. Recall the single- and 
double-layer potentials [15] on an obstacle boundary d£l, defined as 

(2.7) (S dU a)(x)= f G(x,y)a(y)ds y , 

Jan 



and 



(2-8) (O 80 r)(x)=/ -— (x,y)r(y)d ay , 

Jdn dn{y) 

respectively. The kernel G(x,y) := <5(x — y) is the free-space Green's function for the 
Helmholtz equation at frequency to, 

(2.9) g(x) = ^ 1) (^|x|), xel 2 \{0}, 

where is the outgoing Hankel function of order zero. 

For the purposes of periodizing, auxiliary layer potentials on the L and R walls are 
needed. To handle their infinite extent, we switch from coordinate y to the Fourier variable 
k, via 

-i roo roo 

(2.10) §(*>) = — e ~ iky s(y)dy, g(y)= e ik v g (k)dk, 

^ 7r J -oo J -oo 

and make use of the spectral representation of the free-space Green's function [35j Ch. 7.2], 



„■ roc %\J uj 2 —k 2 \x\ 

(2.11) 0(x) = - / e*» -=== dk, x = (x, y) 

Square-roots are again taken non-negative real or positive imaginary, achieved by taking 
the branch cuts of the function \JuP- — k 2 in the k plane as (— oo, — u>) U (oj, +00) along the 
real axis and using a so-called Sommerfeld contour [TJ] for the integration passing from the 
2nd to 4th quadrants (Fig 11.1( c)). Inserting ()2.11|) into the usual expressions for single- 
and double-layer potentials living on a vertical wall W = {(xQ,y) : y G M} (W will be L or 
R, where xq takes the values — d/2 or d/2 respectively), we get "Fourier layer potentials" 

(2.12) GV£)(x) = \ r^y e iV^\,-x oj J_ mdk , 

2 J-oo \/uj 2 - k 2 

(2.13) (T) w Z)(x) = S[ % n ( x - x ^ r e iky e iV^^\x- X0 \^ k) dk 

Here fi and i) are interpreted as Fourier-transformed layer densities, or the coefficients of a 
plane-wave representation. 

We use a standard combined-field representation (to avoid spurious interior obstacle 
resonances [15]) with density r\ on the obstacle boundary, and auxiliary densities £ = [jl; v\ 
which will represent fields due to the remaining lattice of obstacles in the grating, thus 

(2.14) u = (V an - iuS gn )v + ($l + aS R )fi + (V L + aV R )v in U\U 

See Fig. 11.1( b). Imposing the Dirichlet boundary condition in (jl.ip whilst the imposing 
quasi-periodicity at the walls f|2.3j) — (|2.4j) results in the first and second rows, respectively, 
of the following 2-by-2 linear operator system 



'A 


B 




~v 






6 


Q. 




I 








(2.15) 

where the four operators A, B, C and Q are defined in the remainder of this section. Q 



^Note that operators with the symbol A involve Fourier variables, a notation consistent with [5] . 



6 



A. GILLMAN, A. BARNETT 



Using jump relations [15] . we have A = 1/2 + D — iujS where S, D : C(d£l) — > C{d£l) are 
the boundary operators with the kernels of (|2.T|) and (|2.8p . respectively. 

The operator B gives the effect of the auxiliary densities on the obstacle boundary value. 
By restricting ([2TT2l) - ([2TT5D and (jmj) to dtt, 



(2.16) 



B = [Son t L + aSon^R, Vqq^ + aT>dn,R] 



where Sg^^w an d T^dil,W denote the operators resulting from restricting (|2.12p and (|2.13p 
to evaluation on dVt. 

Following [5], we impose the second row of (|2.15p in Fourier space to enable an efficient 
discretization with spectral accuracy. Via (|2.7p . (j2.8|) and (|2.1ip . the operators mapping 
single- and double-layer densities on d£l to their Fourier values on a wall W are defined by 

i\JiJ 2 ~ k 2 \x—xq\ 



w,an 



and 



a)(k) 
1 

4tt 



i 

4-7T 



-iky 



Oil 



k 2 



cr (y)ds y , y = (x, y) £ <9ft 



^—iky^L\JtJ 2 —k 2 \x— xq\ 



on 



sign(x - x ), 



k 2 



■ n(y)r(y)c?Sj 



Similarly, the following two operators which instead map to Fourier normal derivatives on 
the wall W are defined by 

1 



(D 



W,aci a )(^) 



(T^ m r)(k) = — 



4?r 
i 

4?r 



e ~iky e i\/w 2 -k 2 |a 



g— iky^i^Ju 2 — k 2 |a 



Oil 



~ x o\ sig n (x _ X q) a{y)ds y 



k 2 , -k sign(x - x )) • n(y) r(y)ds y 



These four formulae allow us to express the operator C in (I2.15p . which gives the effect of 
the density rj on the Fourier transforms of the wall quasi-periodicity conditions (I2.3p - (|2.4p . 

as 



(2.17) 



C 



u L,dn 
rpA 

1 L,an ' 



- iujS L,au 
iUjD L,an 



1 ( D R,on 



Finally, Q maps Fourier wall densities to Fourier wall quasi-periodicity conditions. Thus, 
by translational invariance, each of its four blocks must be a pure multiplication operator 
in k. The Fourier coefficients of (l2TT2l) - ([2TT3]) in (pT4"D and ([^HllII) give, 

„i\/ u) 2 — k 2 d 



(2.18) 



Q 



i o 

/ 



+ 



i(ct 



a 



k 2 



a + a 



i(a 



-a 



a 



a 



k 2 



Remark 2.2. Once the operator system (|2.15p is solved for r] and £, the desired Bragg 
amplitudes c n and d n appearing in (|2.5p - (|2.6p are extracted by evaluating the scattered 
field u in (|2.14j) . and its y-derivative, at typically 20 equi-spaced samples along the lines 
{(x,y) : \x\ < d/2,y = ±yo}> then applying the discrete Fourier transform (e.g. via the 
FFT). See [5] for more details. 

2.3. Discretizing the integral equations. This section describes a discretization of the 
linear operator system (j2.15[) with high-order accuracy to obtain a finite-sized linear system. 
We discretize all operator blocks of E via the Nystrom method [29] . Since the A operator 
has a kernel with a logarithmic singularity at the diagonal, it requires special quadrature 
corrections. For this, we choose the 6th-order Kapur-Rokhlin rule [27] : however, the fast di- 
rect solver is compatible with other recently-developed local-correction quadrature schemes 
such as that of Alpert [I] , that of Helsing [24] , generalized Gaussian [23] , or QBX [28] . 
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The boundary dO, is parameterized by the smooth 2-7r-periodic function z : [0, 2ir) — > 9f2, 
then discretized using the iV-point global periodic trapezoid rule with nodes = z(2irj/N), 
j = 1, . . . , N. Then the N-bj-N matrix A represents the operator A. The Kapur-Rokhlin 
scheme modifies the weights (but not the nodes) near the diagonal, giving the matrix entries 

A nm = (2Tr/N)R ]n _ m] [dG(y n ,y m )/dn(y m ) - iuG(y n ,y m )]\z'(2^m/N)\. 

The values Rj are given in terms of 7j from the left-center block of |27l Table 6], as follows: 
Ro = 0, while Rj = Rn-j = 1 + 7j + 7-j for 1 < j < 6, and Rj = 1 otherwise. 

There is some freedom in choosing a contour for the Sommerfeld ^-integrals in (|2.12p - 
(|2.13j) . We choose a hyperbolic tangent curve of height O(l) and use the trapezoid rule in 
the real part of k with spacing h, truncated to maximum real part K (see Fig. ll.l( b) and [5]). 
K is chosen such that the integrand is exponentially small beyond this real part. There are 
M = 2K/h nodes kj € C for j = 1, . . . ,M. Since the scheme is exponentially convergent, 
M is typically only 100 for full machine precision when uj is around 10. Note that since 
there are two types of auxiliary densities, there are 2M unknowns in the periodizing scheme. 
Using the above quadrature nodes and weights, B and C are simply Nystrom discretizations 
of (|2.16p and (|2.17p . while Q has four diagonal sub-blocks with entries given by (|2.18p 
evaluated at the Sommerfeld nodes. The entire square block system in N + 2M unknowns 
is written (we drop the A symbols from now on for simplicity), 



(2.19) 



" A 


B " 




V 




" b ' 


C 


Q 




. 1 _ 








where b is the vector of values of — u 1 at the nodes on d£l. 

Remark 2.3. Once r\ and £ have been solved for, the desired Bragg amplitudes may be 
computed as in Remark 12.21 evaluating (|2.14p using the underlying Nystrom quadratures 
on dVt and in the Fourier k variable. 

2.4. Improving the convergence rate, and robustness near Wood's anomalies. In 

this section, we briefly sketch two ways to improve the convergence rate and robustness of 
the integral formulation (see [5] for full details). 

Firstly, we include nearest neighbor images of the obstacle in the representation, replacing 
flgJlD with 

P 

(2.20) u = ^2 a j (D an+ j d - iuS d n+jd)ri + (S L + aS R )fi + (V L + aV R )u in U\Q 
j= -P 

where d = (d, 0) is the lattice vector and P is the number of nearest neighbors on either 
side of f2 to be included. Since the auxiliary periodizing wall densities have to represent 
fields whose nearest singularities are now further away, this improves the convergence rate 
with respect to M. We have found that P = 1 or P = 2 is optimal. As a result, A will 
now contain not only a self-interaction of d£l, but interactions from the boundaries of 2P 
obstacles neighboring Q. 

Recall that A is already defined as the matrix from Nystrom discretization of the self- 
interaction operator A = 1/2 + Dqq qq — iujSqq^qq, where Dy t w and Sy,w represent the 
integral operators with single- and double-layer kernels acting from source curve W to target 
curve V. For j € Z\{0}, let Aj denote the matrix resulting from Nystrom discretization of 
the operator Aj = -Dan,an+jd — ^Sgn^n+jd- This expresses the effect on dQ of a neighbor 
j obstacles from <9f2. Let 

p 

A = A+ ^2 aj N- 
j= -P 
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Then, with this new representation, the block linear system (|2.19p is replaced by 



(2.21) 



A 


B 




V 




b 


C 


Q _ 




. € _ 








where C is the Nystrom discretization of 

q = aP ( D L,dn+Pd ~ tUjS L,dn+Pd) - a [P+1 ^( D R,an-(p+i)d~ tUjS R,dQ~{p+i)d S ) 
°- F ( T L,dn+Pd - iu)D L,dn+Pd) ~ a (P+1) ( r R,on-(p+i)d ~~ lujD iCdn-{p+i)d)_ 

which results from the new representation via some cancellations as in [5]. 

The scheme as presented so far would fail as one approaches a Wood's anomaly, since the 
auxiliary densities ft(k) and v{k) contain poles at k = ±k n , and as any k n approaches the 
origin this makes the Sommerfeld quadrature highly inaccurate [5j Sec. 5]. Thus, the second 
way to improve the scheme is to, in this situation, displace the Sommerfeld contour by an 
0(1) real value such that it is far from all poles =t/c n . This causes an incorrect radiation 
condition for the Rayleigh-Bloch mode whose pole the contour crossed. However, this is 
fixed simply by including an unknown coefficient for this mode in the representation (12.201) , 
and imposing the one extra linear condition that the radiation condition (12. 6p be correct. 
The elements of the matrix row which imposes the condition that there be no incoming 
radiation in the relevant mode are computed as in Remark 12.31 

The net effect is that, by expanding the system (|2.19p or (|2.2ip by one extra row and 
column, parameters at or near Wood's anomalies are also handled to machine precision. 
This robustness distinguishes the scheme from many other integral-equation-based solvers. 



3. A DIRECT SOLUTION TECHNIQUE 

This section presents techniques for solving the discretized linear system (|2.2ip in the 
case of multiple incident angles, using only a single inversion of the matrix A. Coupling 
these techniques with the method described in section [5] and [18] will result in a fast direct 
solver for the Dirichlet scattering problem (jl.ip . 

For simplicity, we start with the block solution technique assuming no contributions from 
neighboring obstacles, i.e. P = 0. Then we describe how to handle P > by exploiting 
the internal structure of the matrices characterizing the interaction between dO, and its IP 
nearest neighbors. 

3.1. The block solve (case P = 0). When there are no neighbor contributions, the 2x2 
linear system (|2.19p has solutions t] and £ given by 

. £= (Q-CA^B) - ^-^ 

(3.1) v ; 

rj = A~ 1 b - A _1 B£. 

Their computation involves inverting two matrices A and (Q — CA _1 B). Of course this 
assumes that A is invertible; however, this is known to hold for sufficiently large N since 
it is the Nystrom discretization of the injective operator 1/2 + D — iuiS arising in the 
scattering from an isolated obstacle |16} p. 48]. Since the size of Q is much less than N 
(typically M ~ 100), the cost of the solve is dominated by the inversion of the N xN matrix 
A. Fortunately, A has internal structure which makes it amenable to an O(N) inversion 
technique described in section HI Notice that while B, C and Q do depend upon the incident 
angle via their dependence on a, the matrix A does not. Thus A may be inverted once and 
for all at each frequency u). 



A FAST DIRECT SOLVER FOR QUASI-PERIODIC SCATTERING PROBLEMS 



9 



Remark 3.1. At a fixed frequency cu, there may be multiple incident angles 6 that share 
a common Bloch phase a via the relation a = e lujdcosS . Let q be the number of incident 
angles that share an a. It is then easy to check that, within ±1, we have q ~ ud/ir, so 
that q is proportional to the grating period in wavelengths. Notice that the number of 
matrix- vector multiplies with A" 1 required to evaluate (|3.ip is then 2M + q, and that A -1 
need only be accessed twice. We will later exploit this fact for efficiency. 

3.2. The block solve with neighboring contributions. Recall from section [2~4l that, 
when contributions from neighboring obstacles are included in the integral formulation, the 
matrix A in (13. If) is replaced by 

p 

A = A + ^2 a j Aj- 

3=-P, 

In practice, we find P = 1 or 2 is best as P > 2 gives little additional accuracy. 

Because d£l is separated from its neighbors, the matrices Aj are low rank (i.e. Aj has 
rank I where / <C N). Thus they admit a factorization Aj = LjRj where Lj and Rj are of 
size N x I. This means that the matrix that needs to be inverted is A + LR where 

L = [a _p l p\ ■ ■ ■ \a~ l \ i|al_i| • • • |a p Lp] 

and 

R = [R^pl ■ ■ ■ |R^! il^Tl • • • |Rp] 

The inverse of A = A + LR can be computed using only the inverse of A via the Woodbury 
formula [20j 

(3.2) A" 1 = (A + LR)- 1 = A" 1 + A~ X L (I + RA^L) -1 RA" 1 . 

Note that the square matrix (l + RA _1 L) is only of size 2PI thus can easily be inverted 
with dense linear algebra. Moreover, in practice we do not actually compute A -1 . Instead 
we apply A" 1 via (I3.2p which requires two applications of A -1 . For example, to multiply 
A -1 by a vector x, we need to evaluate A _1 L and A~ 1 x. For large N, this is done via the 
techniques described in section HI 

Instead of using a QR factorization to find Lj and Rj, we choose to use an interpolatory 
decomposition [211 CEH] defined as follows. 

Definition 3.1. The interpolatory decomposition of a m x n matrix M that has rank / is 
the factorization 

M = PM(J(1 : /),:) 

where J is a vector of integers ji such 1 < jj < m, and P is a ra x I matrix that contains a 
/ x I identity matrix. Namely, P(J(1 : I), ■) = I/- 

Since the Lj and Rj matrices do not involve a, they need only be computed once at each 
frequency, independent of the number of incident angles. However, the cost of computing 
these factorizations using general linear algebraic techniques such as QR is 0(N 2 l). This 
would negate the substantial savings obtained by using the O(N) inversion technique for 
A to be described in section HI 

To restore the complexity, we use ideas from potential theory. First, a circle of radius 
approximately twice the typical radius of is placed concentric with it. From potential 
theory, we know that any field generated by sources outside of this circle can be approx- 
imated arbitrarily well by placing enough equivalent charges on the circle. In practice, it 
is sufficient to place a small number of "proxy" points spaced evenly on the circle. For the 
relatively low frequencies (i.e. small uj) tested in this paper, we have found it is enough to 
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have 75 proxy points. We call all the points on the neighboring obstacles that are within 
the proxy circle near points. 

Instead of computing the interpolatory decomposition of each Aj, one matrix P is gen- 
erated by computing the interpolatory decomposition of the matrix 



where J near corresponds to the indices of the points on 90 + jd that are near 90, and 
A proxy is a matrix that characterizes the interaction between the nodes on 90 and the 
proxy points. We call the points on 90 picked by the interpolatory decomposition skeleton 
points. Figure [3~T1 illustrates the proxy points, near points and skeleton points for a sample 
domain. This P can be used for all IP nearest neighbors. For each j, the Rj is simply given 



Remark 3.2. When A is replaced by A in (|3.ip . and the Woodbury formula used for A, the 
number of matrix- vector multiplies with A" 1 required to evaluate ()3.ip is then 2M + q + 2Pl. 



Figure 3.1. Illustration of proxy points (x) and the skeleton points (o). 
The near points are the points inside the proxy circle not belonging to O. 

4. Creating a compressed inverse of the matrix A 

While the matrix A is dense, it has a structure that we call Hierarchically Block Separable 
(HBS) which allows for an approximation of its inverse to computed rapidly. Loosely 
speaking, its off-diagonal blocks are low rank. This arises because A is the discretization 
on a curve of an integral operator with smooth kernel (when co is not too large). This 
section briefly describes the HBS property and how it can be exploited to rapidly construct 
an approximate inverse of a matrix. For additional details see [TTj- Note that the HBS 
property is very similar to the concept of Hierarchically Semi- Separable (HSS) matrices 



4.1. Block separable. Let M be an mp x mp matrix that is blocked into p x p blocks, 
each of size m x m. 

We say that M is "block separable" with "block-rank" k if for r = 1, 2, . . . , p, there 
exist n x k matrices U r and V T such that each off-diagonal block M CT)T of M admits the 
factorization 



[A i (:,/ near )|^ proxy )] 



by R^A.O/a :/),:). 




nam]. 



(4.1) 



U CT M CTiT V*, 



a,r G {1, 2, ... 



p}, a^r. 



m x m 



m x k k x k k x m 
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Observe that the columns of must form a basis for the columns of all off-diagonal 
blocks in row a, and analogously, the columns of V r must form a basis for the rows in 
all the off-diagonal blocks in column r. When (|4.ip holds, the matrix M admits a block 
factorization 

V* + D, 

kp x mp mp x mp 

.,V P ), D = diag(Di, D 2 , D p ), 






M12 


M13 


M21 





M23 


M31 


M32 






Once the matrix M has put into block separable form, its inverse is given by 



(4.3) M" 1 = E (M + D)" 1 F* + G, 
where 

(4.4) D = (V* D -1 U)~\ 

(4.5) E=D _1 UD, 

(4.6) F=(DV*D -1 )*, 

(4.7) G= D- 1 -D~ 1 UDV*D- 1 . 



4.2. Hierarchically Block-Separable. Informally speaking, a matrix M is Hierarchically 
Block-Separable (HBS), if it is amenable to a telescoping version of the above block factor- 
ization. In other words, in addition to the matrix M being block separable, so is M once it 
has been reblocked to form a matrix with p/2 x p/2 blocks, and one is able to repeat the 
process in this fashion multiple times. 

For example, a "3 level" factorization of M is 

(4.8) M = U< 3 > (U< 2 > (U« M<°> (V«)* + B«) (V< 2 >)* + B^) (V®)* + , 

where the superscript denotes the level. 

The HBS representation of an N x N matrix requires O(Nk) to store and to apply 
to a vector. By recursively applying formula (|4.3p to the telescoping factorization, an 
approximation of the inverse can be computed with 0(Nk 2 ) computational cost; see |17j . 
This compressed inverse can be applied to a vector (or a matrix) very rapidly. Note that 
for memory movement reasons, it is more efficient to apply A -1 to a block of vectors than 
to each vector sequentially. 

5. Transmission problem 

So far this paper has focused on the Dirichlet problem. However, all of the techniques 
presented here also apply to the transmission problem, with minor modifications that we 
describe briefly in this section. 



(4.2) M = U M 

mp x mp mp x kp kp x kp 

where 

U = diag(Ui, U 2 , • • • , U p ), V = diag(Vi, V 2 , . . 
and 
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5.1. The boundary value problem. This section presents the unit-cell version of the 
boundary value problem analogous to section 12.11 

Let all obstacles in the grating have refractive index n, and the background index be 1. 
Then the partial differential equation analogous to (|2.ip is 



(5.1) (A + nV)« = inn 

(5.2) (A + co 2 )u = inU\Tl 
with matching conditions on the boundary analogous to (|2.2p . 

(5.3) u + — u~ = —u 1 on dO, 

(5.4) Un ~ u n = -«« on 9^ ■ 



The quasi-periodicity and radiation conditions are the same as for the Dirichlet case. Prob- 
lems of this kind correspond to the acoustic scattering from obstacles with a constant wave 
speed differing from the background value, or electromagnetic scattering from a dielectric 
grating in TM polarization. Remark 12. II also applies. 

5.2. The integral formulation. The transmission obstacle case is handled via an integral 
formulation often called the Miiller-Rokhlin scheme [38] : a pair of unknown densities r\ = 
[a; t] (where a is single-layer and r a double-layer) is used on <9f2, and these densities also 
generate potentials inside Q with the interior wavenumber nw. The resulting operator A 
turns out to be a compact perturbation of the 2-by-2 identity. Operators B and C become 
slightly more elaborate, whilst Q is unchanged. The complete recipe is found in [5]. 

5.3. The linear system. As with the Dirichlet problem, dQ is discretized via a Nystrom 
method with N points, and the Sommerfeld contour is discretized as before with M points. 
However, since there is a pair of densities a and r defined on the boundary (one of each for 
every point), the resulting linear system has 2N + 2M unknowns. In order for the matrix 
A to be amenable to the fast direct solver described in section [U it is crucial to interlace 
the unknowns so that nearby entries in the vector have nearby locations on dQ, namely 
rj = [o\] Ti; &2\ t~2\ ■ ■ ■ ; <Jn\ tjv]- This reorders the columns of A; the same reordering is also 
needed for the rows. 

6. Numerical examples 

In this section, we illustrate the performance of the proposed solution methodology for 
several different problems. Once the obstacle, period d, frequency lj, and set of desired 
incident angles are specified, the solver requires two steps: 

• Pre- computation: A compressed representation of the matrix A -1 is computed via 
the technique in section 01 If P > 0, the low-rank neighbor contributions Lj and Rj 
are computed as in section 13.21 This step is performed once. 

• Block solve: For each distinct Bloch phase a derived from the set of incident angles, 
a N-by-q right-hand side matrix is formed by stacking the right-hand sides b in 
(I2.19|) for the q incident angles that share this common a. The block solve formula 
(I3.1|) from Section 13.11 is then applied to this right-hand side matrix, at a cost of 
(q + 2M + 2PI) matrix-vector products with the compressed inverse. 

It is advantageous for the user to choose many incident angles that share a common a. 
This leads an overall efficiency gain of a factor q results (assuming q 2M + 2PI). 

All experiments are run on a Lenovo laptop computer with 8GB of RAM and a 2.6GHz 
Intel i5-2540M processor. The direct solver was run at a requested relative precision of 
lO" 10 . It was implemented rather crudely in MATLAB, which means that significant further 
gains in speed should be achievable. 
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Remark 6.1 (error measure). In the below, the solution error quoted is the "flux error", 
namely the absolute difference between the total incoming and outgoing flux. Since these 
fluxes should be equal, this provides a standard error measure in diffraction problems. It is 
computed via a weighted sum of the squared magnitudes of the Bragg amplitudes c n and 
d n in (|2.5p - (|2.6p ; see [5j Eq. (4.2)]. We have checked that this measure is similar to the 
pointwise error in the far field. 

6.1. Scaling with problem size. In this section, we apply the direct solver to both the 
Dirichlet and transmission scattering problems from a grating of star-shaped domains with 
period d = 1, and the frequency oj is fixed at 10. For the transmission case, the index 
is n = 1.5. We choose a single incident angle 9 = —ir/5. Hence, q = 1. The obstacle 
is defined by the parametrization z(t) = {fit) cos t, f{t) sint) where the radius function is 
f{t) = 0.35 + 0.105 cos(3t) for angle t £ [0,2vr). Figure O plots the total field for both 
cases. 



TO 

(a) (b) 

Figure 6.1. Illustration of the total field for star shaped obstacles with 
(a) Dirichlet and (b) transmission boundary conditions. 

The number of periodizing unknowns were kept fixed, with M = 90, while the number 
of discretization points on the boundary of the obstacle was increased. This experiment 
is designed to illustrate the scaling of the fast direct solver. (Note that for most of the 
N tested, the boundary is over-discretized: for N = 512, the discretized integral equation 
already has accuracy 10~ 8 .) Figure [6721 illustrates the times for the pre-computation and 
the solve steps in the direct solver for the Dirichlet and transmission problems both with 
and without neighbor contributions {P = 1). As expected, since <9f2 is not space-filling and 
uj is small, the times scale linearly with N. Notice that the cost of adding the contributions 
from nearest neighbors is small, since their interaction rank is I = 58. For the Dirichlet 
problem with N = 2 17 = 131072, it takes 95 seconds for the precomputation and 24 seconds 
for the block solve when P = 0, versus 114 seconds for the precomputation and 32 seconds 
for the block solve when P = 1. 

6.2. Comparison of fast direct solver against a fast iterative solver. In this exper- 
iment, we consider a more challenging Dirichlet problem with uj = 30 where the obstacle 
$7 is the complicated domain whose total field is illustrated in Figure 16.31 This domain is 
given by a radial function fit) which is a random Fourier series with 201 terms. The first 
cosine term is set large and negative to give the obstacles the shape of a vertically-oriented 
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N 



(a) (b) 
Figure 6.2. Time in seconds vs N for the (a) Dirichlet and (b) transmis- 
sion boundary value problems. The times for pre-computation are marked 
with o while times for the block solves are reported with □. The dash lines 
correspond to the times with contributions from neighbors while the solid 
corresponds to the times when no neighbor contribution is added. 



ellipse. Its rough surface resembles a dentritic metallic particle. The large number of tight 
curves and close approaches of the boundary with itself is typical of complex geometries, 
and demands a large N to reach any reasonable accuracy. It also causes ill-conditioning 
that demands a large number of GMRES iterations in an iterative solver. 

The grating period is d = 1, corresponding to around 5 wavelengths. The closest distance 
between obstacles is around 0.25. We choose an incident angle corresponding to a Wood's 
anomaly (0 = — arccos(l — 2tt/cj)); note that therefore the problem cannot even be solved 
the standard integral-equation approach based on the quasi-periodic Green's function. We 
take P = 1. In order to obtain an accuracy of 10~ 8 for this problem, 10 5 unknowns are 
needed on the boundary of the obstacle (N) and the number of periodizing unknowns was 
increased to M = 120. 

Two methods were used to solve for the unknown densities. Firstly we tested an itera- 
tive solution of (|2.2ip using standard GMRES without restarts, with the publicly-available 
Helmholtz FMM of Greengard— Gimbutas [19] to apply the matrices in the A block (with 
quadrature corrections near the diagonal where appropriate) , and dense matrix- vector mul- 
tiplication for the other three blocks. Secondly we tested the direct solver scheme presented 
in this work. Note that the FMM is coded in fortran, whereas our direct solver is (apart 
from the interpolative decomposition) in MATLAB. 

The GMRES+FMM solver takes approximately one hour, taking a large number (248) 
of iterations, to solve for the densities. By using the fast direct solver, the densities can 
be found with 4.5 minutes for the pre-computation and 50 seconds for the block solve at 
each a. In other words, for this domain, the direct solver can solve 66 independent incident 
angles in the amount of time it takes the accelerated iterative method to solve for one. 

Note that for the neighbor interactions, we chose 200 points on a proxy circle with 
diameter 1.1 times the vertical height of the obstacle. The approximate interaction rank 
between obstacles is then I = 139. Note a single matrix vector multiply with A -1 takes 
3 seconds. Thus if we were to apply A -1 to one vector at a time in (|3.2j) it would take 
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Figure 6.3. Illustration of the total field at a Wood's anomaly off a collec- 
tion of complicated obstacles. 

3(2M + 2PI) = 1554 seconds. Thus, since the complete block solve takes 50 seconds, 
significant timing gains are seen by applying the A -1 to block matrices instead of single 
vectors. 

Next, we present an example where a large sampling of incident angles are required, as 
would be typical for a solar-cell design problem. An arithmetic series in cos#, with around 
200 values covering the range (—1,1), is considered. Their spacing in cos 9 is 27r/(21 ud), 
which provides roughly 21 q incident angles, where q = ujd/ir « 10. As discussed above, 
because around q angles share the same a, this leads to an additional speed-up of a factor of 
nearly q. It takes 19.1 minutes to solve for the 200 densities (4.1 minutes of pre-computation, 
followed by 15 minutes for the block solves.) Notice that this is around 600 times faster 
than the solution at 200 incident angles would take using GMRES+FMM. 

The resulting fractions of incident flux scattered into each of the Bragg modes (i.e. |c n | 2 
and |<i n | 2 ) are shown, as a function of incident angle 9, in Fig. 16.41 (b). 

6.3. A challenging transmission problem. Prior to the development of the fast direct 
solver, accurately solving a transmission problem on complicated domains required a lot 
of computing time. We applied the direct solver to a transmission problem on the domain 
from section 16.21 With A" = 10 4 and all other parameters as in the previous example (thus 
there are 20241 unknowns), the solver achieves a flux error of 10~ 5 in 6 minutes of pre- 
computation and 14 seconds for each block solve. Figure 16.51 illustrates the total field for 
this example. 

7. Conclusion 

This paper presented a fast direct solution technique for grating scattering problems 
with either Dirichlet or transmission boundary conditions. For low frequency problems on 
simple domains, the computational cost of the solver scales linearly with the number of 
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(a) (b) 

Figure 6.4. (a) The flux errors and (b) the outgoing flux fractions for all 
Bragg modes when solving a Dirichlet problem with 200 incident angles and 
u) = 30. In (b) each flux fraction is shown in a different gray shade, and the 
solid blue line separates reflected from transmitted intensity. 




Figure 6.5. Illustration of the total field at a Wood's anomaly off a collec- 
tion of complicated obstacles with transmission boundary conditions. 

discretization points on one obstacle. The example in section I6.2I illustrates that when the 
obstacle is complicated and the frequency somewhat higher, the direct solver is much faster 
than an FMM accelerated GMRES, because it handles ill-conditioning. Additionally, the 
direct solver is very fast for multiple incident angles that occur often in design problems, and 
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this can be further accelerated in the case when many incident angles share a Bloch phase 
a. In one complicated Dirichlet obstacle case which requires 10 5 unknowns to discretize, 
200 incident angles are solved in under 6 seconds per incident angle. 

Although, for simplicity, we disallowed intersections of d£l with the L and R walls, it 
would be quite simple to adapt the fast direct solver to the scheme presented in [5j Sec. 6] 
to handle this case. It would also be relatively easy to generalize the scheme to multi-layer 
transmission gratings, which are more common in applications. We anticipate creating a 
fast solver for this case in future work. 
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